Neural Video Compression with Diverse Contexts
For any video codec, coding efficiency relies heavily on whether the
current signal to be encoded can find relevant contexts in the previously
reconstructed signals. Traditional codecs have verified that more contexts bring
substantial coding gains, but in a time-consuming manner. However, for the
emerging neural video codec (NVC), the contexts are still limited, leading to
a low compression ratio. To boost NVC, this paper proposes increasing the context
diversity in both temporal and spatial dimensions. First, we guide the model to
learn hierarchical quality patterns across frames, which enriches long-term
yet high-quality temporal contexts. Furthermore, to tap the potential of the
optical flow-based coding framework, we introduce a group-based offset
diversity, where cross-group interaction is proposed for better context
mining. In addition, this paper adopts a quadtree-based partition to
increase spatial context diversity when encoding the latent representation in
parallel. Experiments show that our codec obtains a 23.5% bitrate saving over
the previous SOTA NVC. Better yet, our codec has surpassed ECM, the
next-generation traditional codec still under development, in both RGB and
YUV420 colorspaces, in terms of PSNR. The code is at https://github.com/microsoft/DCVC.
Comment: Accepted by CVPR 2023.
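The abstract above reports its comparison in terms of PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, 10 * log10(MAX^2 / MSE). The function name `psnr`, the flat-list input format, and the default peak value of 255 (8-bit samples) are assumptions made for illustration; this is not code from the DCVC repository.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    `max_value` is the assumed peak sample value (255 for 8-bit content).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        # Identical signals: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR at the same bitrate indicates a reconstruction closer to the original, which is the sense in which one codec "surpasses" another here.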
Sliding at first order: Higher-order momentum distributions for discontinuous image registration
In this paper, we propose a new approach to deformable image registration
that captures sliding motions. The large deformation diffeomorphic metric
mapping (LDDMM) registration method faces challenges in representing sliding
motion since by construction it generates smooth warps. To address this issue,
we extend LDDMM by incorporating both zeroth- and first-order momenta with a
non-differentiable kernel. This allows the model to represent both discontinuous
deformation at switching boundaries and diffeomorphic deformation in
homogeneous regions. We provide a mathematical analysis of the proposed
deformation model from the viewpoint of discontinuous systems. To evaluate our
approach, we conduct experiments on both artificial images and the publicly
available DIR-Lab 4DCT dataset. Results show the effectiveness of our approach
in capturing plausible sliding motion.
Memory-and-Anticipation Transformer for Online Action Understanding
Most existing forecasting systems are memory-based methods, which attempt to
mimic human forecasting ability by employing various memory mechanisms and have
progressed in temporal modeling for memory dependency. Nevertheless, an obvious
weakness of this paradigm is that it can only model limited historical
dependence and cannot transcend the past. In this paper, we rethink the
temporal dependence of event evolution and propose a novel
memory-anticipation-based paradigm to model an entire temporal structure,
including the past, present, and future. Based on this idea, we present
Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based
approach, to address the online action detection and anticipation tasks. In
addition, owing to the inherent superiority of MAT, it can process online
action detection and anticipation tasks in a unified manner. The proposed MAT
model is tested on four challenging benchmarks (TVSeries, THUMOS'14, HDD, and
EPIC-Kitchens-100) for online action detection and anticipation tasks, and it
significantly outperforms all existing methods. Code is available at
https://github.com/Echo0125/Memory-and-Anticipation-Transformer.
Comment: ICCV 2023 Camera Ready
cRedAnno+: Annotation Exploitation in Self-Explanatory Lung Nodule Diagnosis
Recently, attempts have been made to reduce annotation requirements in
feature-based self-explanatory models for lung nodule diagnosis. As a
representative example, cRedAnno achieves competitive performance with considerably
reduced annotation needs by introducing self-supervised contrastive learning for
unsupervised feature extraction. However, it exhibits unstable performance
under scarce annotation conditions. To improve the accuracy and robustness of
cRedAnno, we propose an annotation exploitation mechanism by conducting
semi-supervised active learning with sparse seeding and training quenching in
the learned semantically meaningful reasoning space to jointly utilise the
extracted features, annotations, and unlabelled data. The proposed approach
achieves comparable or even higher malignancy prediction accuracy with 10x
fewer annotations, while also showing better robustness and nodule attribute
prediction accuracy with only 1% of the annotations. Our complete code is
open source and available at https://github.com/diku-dk/credanno.
Comment: 5 pages, 5 figures, 2 tables. arXiv admin note: text overlap with
arXiv:2206.1360
Parameter-free Dynamic Graph Embedding for Link Prediction
Dynamic interaction graphs have been widely adopted to model the evolution of
user-item interactions over time. There are two crucial factors when modelling
user preferences for link prediction in dynamic interaction graphs: 1)
collaborative relationships among users and 2) users' personalized interaction
patterns. Existing methods often implicitly consider these two factors
together, which may lead to noisy user modelling when the two factors diverge.
In addition, they usually require time-consuming parameter learning with
back-propagation, which is prohibitive for real-time user preference modelling.
To this end, this paper proposes FreeGEM, a parameter-free dynamic graph
embedding method for link prediction. Firstly, to take advantage of the
collaborative relationships, we propose an incremental graph embedding engine
to obtain user/item embeddings, which is an Online-Monitor-Offline architecture
consisting of an Online module to approximately embed users/items over time, a
Monitor module to estimate the approximation error in real time and an Offline
module to calibrate the user/item embeddings when the online approximation
errors exceed a threshold. Meanwhile, we integrate attribute information into
the model, which enables FreeGEM to better model users belonging to some
under-represented groups. Secondly, we design a personalized dynamic interaction
pattern modeller, which combines dynamic time decay with an attention mechanism to
model users' short-term interests. Experimental results on two link prediction
tasks show that FreeGEM can outperform the state-of-the-art methods in accuracy
while achieving over 36X improvement in efficiency. All code and datasets can
be found at https://github.com/FudanCISL/FreeGEM.
Comment: 19 pages, 9 figures, 13 tables, Thirty-Sixth Conference on Neural
Information Processing Systems (NeurIPS 2022), preprint version
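The abstract above mentions combining dynamic time decay with an attention mechanism to model short-term interests. One generic way to combine the two is sketched below: each past interaction gets a similarity score, the score is reduced in proportion to the interaction's age, and a softmax turns the result into weights. The function name `time_decay_attention`, the dot-product similarity, and the fixed `decay_rate` are all assumptions made for illustration; this is not FreeGEM's actual formulation, which is described in the paper and its repository.

```python
import math


def time_decay_attention(query, keys, timestamps, now, decay_rate=0.1):
    """Return softmax attention weights over past interactions.

    Each score is the dot product of `query` and a key, minus a penalty
    proportional to the interaction's age. Subtracting the penalty before
    the softmax is equivalent to multiplying each weight by an exponential
    decay factor (an assumed decay form, chosen for simplicity).
    """
    scores = []
    for key, t in zip(keys, timestamps):
        similarity = sum(q * k for q, k in zip(query, key))
        scores.append(similarity - decay_rate * (now - t))
    # Numerically stable softmax: shift by the maximum score.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With two equally similar interactions, the more recent one receives the larger weight, which is the behaviour a short-term interest model would want from this kind of weighting.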